Dynamic Bayesian Networks for Transliteration Discovery and Generation
Abstract
This project concerns the extraction and transliteration of entity names between languages that use different writing systems (or alphabets). Extraction involves the automatic identification of sequences in parallel or comparable corpora that can be considered proper entity names. Transliteration generation, on the other hand, involves the automatic transformation of a source language name into a target language name across different writing systems while ensuring that pronunciation is preserved. Bilingual entity name recognition and/or transliteration are aimed at improving performance in various Natural Language Processing (NLP) applications, including Machine Translation (MT), Cross Language Information Retrieval (CLIR), and Cross Language Information Extraction (CLIE).
As an example, in recognition, there has been growing interest in recognizing variations of the same entity name across different languages (Hsu et al., 2007). Such variations can occur both in a source language that uses the Cyrillic alphabet and in target languages that use the Roman alphabet (Table 1.1). The presence of variations of the same entity name may lead to incomplete search results and cause communication problems in a larger community of language users (Hsu et al., 2007). In medication, identification of drug names that look similar has been found to be very helpful in reducing drug prescription errors (Kondrak and Dorr, 2004).
Table 1.1: Example illustrating variations of the same name in both its source language (Russian, which uses the Cyrillic alphabet) and the languages into which it was transliterated (which use the Roman alphabet). Source: NewsExplorer website.
English: Mbale is a town on the slopes of Mountain Elgon in Eastern Uganda
Russian: Мбале является городом на склонах горных Elgon в Восточной Уганде
Table 1.2: Example illustrating translation of a phrase with an Out of Vocabulary word.
To stress the need for machine transliteration, Table 1.2 shows a case where a machine translation system (the Google Translate engine) encounters a new entity name, "Elgon", in English that it cannot translate into the target language (Russian). Such a situation arises because the translation system does not have this word in its translation dictionary or lexicon. Machine transliteration is one of the best approaches for dealing with Out Of Vocabulary (OOV) problems. This report proposes work in the framework of Dynamic Bayesian Networks (DBNs) in which various model spaces are investigated with the aim of improving entity name recognition and machine transliteration across different writing …
Similar resources
Mining Transliterations from Wikipedia using Dynamic Bayesian Networks
Transliteration mining is aimed at building high quality multi-lingual named entity (NE) lexicons for improving performance in various Natural Language Processing (NLP) tasks including Machine Translation (MT) and Cross Language Information Retrieval (CLIR). In this paper, we apply two Dynamic Bayesian network (DBN)-based edit distance (ED) approaches in mining transliteration pairs from Wikipe...
A Bayesian model of bilingual segmentation for transliteration
In this paper we propose a novel Bayesian model for unsupervised bilingual character sequence segmentation of corpora for transliteration. The system is based on a Dirichlet process model trained using Bayesian inference through blocked Gibbs sampling implemented using an efficient forward filtering/backward sampling dynamic programming algorithm. The Bayesian approach is able to overcome the o...
Applying a Dynamic Bayesian Network Framework to Transliteration Identification
Identification of transliterations is aimed at enriching multilingual lexicons and improving performance in various Natural Language Processing (NLP) applications including Cross Language Information Retrieval (CLIR) and Machine Translation (MT). This paper describes work aimed at using the widely applied graphical models approach of Dynamic Bayesian Networks (DBNs) to transliteration identifi...
Evaluation of Dynamic Bayesian Network models for Entity Name Transliteration
This paper proposes an evaluation of DBN models so as to identify DBN configurations that can improve machine transliteration accuracy.
Publication date: 2009